arXiv:1503.00572v1 [math.CO] 27 Feb 2015


Mode Poset Probability Polytopes 


Guido Montúfar^1 and Johannes Rauh^2

^1 Max Planck Institute for Mathematics in the Sciences,
Inselstraße 22, 04103 Leipzig, Germany,
montufar@mis.mpg.de,

^2 Leibniz Universität Hannover,
Welfengarten 1, 30167 Hannover, Germany,
rauh@math.uni-hannover.de


Abstract. A mode of a probability vector is a local maximum with 
respect to some vicinity structure on the set of elementary events. The 
mode inequalities cut out a polytope from the simplex of probability 
vectors. Related to this is the concept of strong modes. A strong mode of 
a distribution is an elementary event that has more probability mass than 
all its direct neighbors together. The set of probability distributions with 
a given set of strong modes is again a polytope. We study the vertices, 
the facets, and the volume of such polytopes depending on the sets of 
(strong) modes and the vicinity structures. 


1 Introduction 

Many probability models used in practice are given in a parametric form.
Sometimes it is useful to also have an implicit description in terms of
properties that characterize the probability distributions that belong to the
model. Such a description can be used to check whether a given probability
distribution lies in the model or, otherwise, to estimate how far it lies from
the model. For example, if a given model has a parametrization by polynomial
functions, then one can show that it has a semialgebraic description; that is,
an implicit description as the solution set of polynomial equations and
polynomial inequalities. Finding this description is known as the
implicitization problem, which in general is very hard to solve completely.
Even if it is not possible to give a full implicit description, it may be
possible to confine the model by simple polynomial equalities and inequalities.
Here we are interested in simple confinements, in terms of natural classes of
linear equalities and inequalities.

We consider polyhedral sets of discrete probability distributions defined by
prescribed sets of modes. A mode is a local maximum of a probability vector.
Locality is with respect to a given vicinity structure on the set of coordinate
indices; that is, $x$ is a (strict) mode of a probability vector $p$ if and
only if $p_x > p_y$ for all neighbors $y$ of $x$. The vicinity structure
depends on the setting. For probability distributions on a set of fixed-length
strings, it is natural to call two strings neighbors if and only if they have
Hamming distance one. For probability distributions on integer intervals, it is
natural to call two integers neighbors if and only if they are consecutive. In
general, a vicinity structure is just a graph with undirected edges.
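As a minimal sketch of these notions, the following Python snippet builds the Hamming-distance-one vicinity structure on bit strings and lists the strict modes of a probability vector. The function names `hamming_neighbors` and `strict_modes` and the example vector are illustrative assumptions, not notation from this paper.

```python
from itertools import product

def hamming_neighbors(n):
    """Vicinity structure on {0,1}^n: strings at Hamming distance one."""
    nbrs = {}
    for bits in product("01", repeat=n):
        s = "".join(bits)
        # flip each position in turn to obtain the n neighbors of s
        nbrs[s] = [s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]
                   for i in range(n)]
    return nbrs

def strict_modes(p, nbrs):
    """Points x with p[x] > p[y] for every neighbor y of x."""
    return sorted(x for x in p if all(p[x] > p[y] for y in nbrs[x]))

nbrs = hamming_neighbors(2)
p = {"00": 0.1, "01": 0.4, "10": 0.3, "11": 0.2}  # hypothetical distribution
print(strict_modes(p, nbrs))  # ['01', '10']
```

Any other undirected graph can be used in place of the Hamming graph by supplying a different adjacency dictionary.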

Modes are important characteristics of probability distributions. In
particular, the question whether a probability distribution underlying a statistical
experiment has one or more modes is important in applications. Also, many 
statistical models consist of “nice” probability distributions that are “smooth” 
in some sense. Such probability distributions have only a limited number of 
modes. Another motivation for studying modes was given in [2], where it was 
observed that mode patterns are a practical way to differentiate between certain 
parametric model classes. 

Besides modes, we are also interested in the related concept of strong modes
introduced in [5]. A point $x$ is a (strict) strong mode of a probability
distribution $p$ if and only if $p_x > \sum_{y \sim x} p_y$, where the sum runs
over all neighbors $y$ of $x$. Strong modes offer similar possibilities as
modes for studying models of probability distributions. While strong modes are
more restrictive than modes, they are easier to study.
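To illustrate that strong modes are more restrictive than modes, here is a small Python sketch on the four-cycle of two-bit strings. The helper names and the example distribution are assumptions made for illustration. Exact fractions are used so that ties are not affected by rounding.

```python
from fractions import Fraction

def strict_modes(p, nbrs):
    """x is a strict mode if p[x] exceeds p[y] for each neighbor y."""
    return sorted(x for x in p if all(p[x] > p[y] for y in nbrs[x]))

def strict_strong_modes(p, nbrs):
    """x is a strict strong mode if p[x] exceeds the total mass of its neighbors."""
    return sorted(x for x in p if p[x] > sum(p[y] for y in nbrs[x]))

# Square graph: two strings are neighbors iff they differ in one bit.
nbrs = {"00": ["01", "10"], "01": ["00", "11"],
        "10": ["00", "11"], "11": ["01", "10"]}
q = {"00": Fraction(1, 10), "01": Fraction(4, 10),
     "10": Fraction(3, 10), "11": Fraction(2, 10)}

print(strict_modes(q, nbrs))         # ['01', '10']
print(strict_strong_modes(q, nbrs))  # ['01']
```

Here the point 10 is a strict mode, but its mass 3/10 only equals the combined mass of its neighbors, so it is not a strict strong mode.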

One observation is the following: Suppose that $p = \sum_{i=1}^{k} \lambda_i p^i$
is a mixture of $k$ probability distributions. If $p$ has a strict strong mode
$x \in V$, then $x$ must be a mode of one of the distributions $p^i$, because
if $p^i(x) < p^i(y_i)$ for some neighbor $y_i$ of $x$ for all $i$, then
$p(x) = \sum_i \lambda_i p^i(x) < \sum_i \lambda_i p^i(y_i)
\le \sum_{y \sim x} \sum_i \lambda_i p^i(y) = \sum_{y \sim x} p(y)$.
For example, a mixture of $k$ uni-modal distributions has at most $k$ strong
modes. Surprisingly, the same statement is not true for modes: A mixture of $k$
product distributions may have more than $k$ modes [5]. Still, the number of
modes of a mixture of product distributions is bounded, although this bound is
not known in general. As another example, in [5] it was shown that a restricted
Boltzmann machine with $m$ hidden nodes and $n$ visible nodes, where $m < n$
and $m$ is even, does not contain probability distributions with certain
patterns of $2^m$ strict strong modes.

In this paper we derive essential properties of (strong) mode polytopes,
depending on the vicinity structures and the considered patterns of (strong)
modes. In particular, we describe the vertices, the facets, and the volume of
these polytopes. It is worth mentioning that mode probability polytopes are
closely related to order and poset polytopes. We describe this relation at the
end of Section 2.

This paper is organized as follows: In Section 2 we study the polytopes of
modes and in Section 3 the polytopes of strong modes.

2 The polytope of modes 

We consider a finite set of elementary events $V$ and the set of probability
distributions on this set, $\Delta(V)$. We endow $V$ with a vicinity structure
described by a graph. Let $G = (V,E)$ be a simple graph (i.e., no multiple
edges and no loops). For any $x, y \in V$, if $(x,y) \in E$ is an edge in $G$,
we write $x \sim y$. Since we assume that the graph is simple, $x \sim y$
implies $x \ne y$.


Fig. 1. Above: The graph $G$ from Examples 1 and 2, with $C$ marked in gray.
Below: The corresponding polytopes $M(G,C)$ and $S(G,C)$. Each vertex of these
polytopes is a uniform distribution supported on a subset of $G$, as explained
in Propositions 1 and 3.

Definition 1. A point $x \in V$ is a mode of a probability distribution
$p \in \Delta(V)$ if $p_x \ge p_y$ for all $y \sim x$.

Definition 2. Consider a subset $C \subseteq V$. The polytope of $C$-modes in
$G$ is the set $M(G,C)$ of all probability distributions $p \in \Delta(V)$ for
which every $x \in C$ is a mode.

The set M(G,C) is always non-empty, since it contains the uniform distribution. 
It is a polytope, because it is a closed convex set defined by finitely many linear 
inequalities and, as a subset of $\Delta(V)$, it is bounded. We are interested in the
properties of this polytope, depending on G and C. 

Recall that a set of vertices of a graph is independent if it does not contain
two adjacent elements. If $C$ is not independent, then $M(G,C)$ is not
full-dimensional as a subset of $\Delta(V)$; that is,
$\dim M(G,C) < \dim \Delta(V) = |V| - 1$. Indeed, if $x, y \in C$ are
neighbors, then the defining inequalities of $M(G,C)$ imply that
$p_x \ge p_y \ge p_x$; that is, any $p \in M(G,C)$ satisfies $p_x = p_y$. In
the following we will ignore this degenerate case and assume that the set of
modes is independent.

In some applications, for example those mentioned in the introduction, it is 
more natural to study strict modes; i.e., points $x \in V$ with $p_x > p_y$ for all $y \sim x$.
A description of the set of distributions with prescribed strict modes is easy to 
obtain from a description of M(G,C). 

Example 1. Let $G$ be a square with vertices $V = \{00, 01, 10, 11\}$ and edges
$E = \{(00,01), (00,10), (01,11), (10,11)\}$. The polytope $M(G,C)$ for
$C = \{01, 10\}$ is given in Figure 1.
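The defining inequalities of $M(G,C)$ can be checked directly. The following Python sketch tests membership for the square graph of Example 1; the function name `in_mode_polytope` and the test vectors are illustrative assumptions.

```python
from fractions import Fraction

def in_mode_polytope(p, nbrs, C):
    """Check p >= 0, sum(p) = 1, and p[x] >= p[y] for all x in C, y ~ x."""
    is_distribution = all(v >= 0 for v in p.values()) and sum(p.values()) == 1
    modes_ok = all(p[x] >= p[y] for x in C for y in nbrs[x])
    return is_distribution and modes_ok

square = {"00": ["01", "10"], "01": ["00", "11"],
          "10": ["00", "11"], "11": ["01", "10"]}
C = {"01", "10"}

uniform = {x: Fraction(1, 4) for x in square}
point_00 = {"00": Fraction(1), "01": Fraction(0),
            "10": Fraction(0), "11": Fraction(0)}

print(in_mode_polytope(uniform, square, C))   # True
print(in_mode_polytope(point_00, square, C))  # False
```

The point distribution on 00 fails because its neighbors 01 and 10 are declared modes but carry no mass.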











Vertices. We have defined $M(G,C)$ by linear inequalities (H-representation).
Next we determine its vertices (V-representation). For any non-empty
$W \subseteq V \setminus C$ and $y \in V$ write $y \sim W$ if $y \sim x$ for
some $x \in W$. Moreover, let $N_C(W) = \{y \in C : y \sim W\}$ (this is the
set of declared modes which are neighbors of $W$), and let $e^C_W$ be the
uniform distribution on $N_C(W) \cup W$.

Proposition 1.

1. $M(G,C)$ is the convex hull of
   $\{e^C_W : \emptyset \ne W \subseteq V \setminus C\} \cup \{\delta_x : x \in C\}$,
   where $\delta_x$ denotes the point distribution concentrated on $x$.

2. For any $x \in C$, the distribution $\delta_x$ is a vertex of $M(G,C)$.

3. $e^C_W$ is a vertex of $M(G,C)$ iff for any $x, y \in W$, $x \ne y$, there
   is a path $x = x_0 \sim x_1 \sim \dots \sim x_r = y$ in $G$ with
   $x_0, x_2, \dots \in W$ and $x_1, x_3, \dots \in N_C(W)$.

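Proof. Clearly, for every non-empty $W \subseteq V \setminus C$, the vector
$e^C_W$ belongs to $M(G,C)$, and the same is true for the vectors $\delta_x$
with $x \in C$ (since $C$ is independent). Next we show that each
$p \in M(G,C)$ can be written as a convex combination of
$\{e^C_W : \emptyset \ne W \subseteq V \setminus C\} \cup \{\delta_x : x \in C\}$.
We do induction on the cardinality of $W := \operatorname{supp}(p) \setminus C$.
If $|W| = 0$, then $p \in \Delta(C)$ is a convex combination of
$\{\delta_x : x \in C\}$. Now assume $|W| > 0$. Let
$\lambda = |N_C(W) \cup W| \cdot \min\{p_x : x \in W\}$. Then
$p - \lambda e^C_W \ge 0$ (component-wise) and
$\sum_x (p_x - \lambda e^C_W(x)) = 1 - \lambda$. If $\lambda = 1$, then
$p = e^C_W$ and there is nothing to show. Otherwise,

$$ p' := \frac{1}{1-\lambda} \left( p - \lambda e^C_W \right) \in \Delta(V). $$

Moreover, one checks that $p' \in M(G,C)$. By the choice of $\lambda$,
$\operatorname{supp}(p') \setminus C \subsetneq \operatorname{supp}(p) \setminus C$.
By induction, $p'$ is a convex combination of
$\{e^C_W : \emptyset \ne W \subseteq V \setminus C\} \cup \{\delta_x : x \in C\}$,
and so the same is true for $p$.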

It remains to check which elements of
$\{e^C_W : \emptyset \ne W \subseteq V \setminus C\} \cup \{\delta_x : x \in C\}$
are vertices of $M(G,C)$. Since $\delta_x$ is a vertex of $\Delta(V)$, it is
also a vertex of $M(G,C)$. Let $W \subseteq V \setminus C$ be non-empty. Call a
path such as in the statement of the proposition an alternating path. Suppose
that there is no alternating path from $x$ to $y$ for some $x, y \in W$. Let
$W_1 = \{z \in W : \text{there is an alternating path from } x \text{ to } z\}$
and let $W_2 = W \setminus W_1$. Then $W_1, W_2$ are non-empty, and
$N_C(W_1) \cap N_C(W_2)$ is empty. Hence $e^C_W$ is a convex combination of
$e^C_{W_1}$ and $e^C_{W_2}$, and is not a vertex.

Let $W$ be a non-empty subset of $V \setminus C$ such that any pair of elements
of $W$ is connected by an alternating path. To show that $e^C_W$ is a vertex,
for any different non-empty set $W' \subseteq V \setminus C$ we need to find a
face of $M(G,C)$ that contains $e^C_W$ but not $e^C_{W'}$. If there exists
$x \in W' \setminus W$, then $e^C_{W'}(x) > 0 = e^C_W(x)$. Hence, $e^C_W$ lies
on the face of $M(G,C)$ defined by $p_x \ge 0$, but $e^C_{W'}$ does not.
Otherwise, $W' \subsetneq W$. Let $x' \in W \setminus W'$ and
$y' \in W' \ne \emptyset$. By assumption, there exists an alternating path from
$x'$ to $y'$ in $W$. On this path, there exist $x \in W \setminus W'$ and
$y \in C$ with $y \sim x$ and $y \in N_C(W')$. Therefore,
$e^C_{W'}(y) - e^C_{W'}(x) > 0 = e^C_W(y) - e^C_W(x)$. □
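The vertex criterion of Proposition 1 can be turned into a brute-force enumeration for small graphs. The Python sketch below iterates over all non-empty subsets of the non-mode vertices and keeps those whose elements are pairwise joined by alternating paths. The function names are illustrative assumptions, and the enumeration is exponential in the number of non-mode vertices.

```python
from fractions import Fraction
from itertools import combinations

def alternating_connected(W, NC, nbrs):
    """True if all of W is reachable from one element via steps W -> NC -> W."""
    start = next(iter(W))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in nbrs[x]:
            if y not in NC:
                continue
            for z in nbrs[y]:
                if z in W and z not in seen:
                    seen.add(z)
                    stack.append(z)
    return seen == W

def mode_polytope_vertices(nbrs, C):
    """Vertices of M(G,C) for an independent set C, as dicts on their support."""
    C = set(C)
    rest = sorted(x for x in nbrs if x not in C)
    vertices = [{x: Fraction(1)} for x in sorted(C)]  # point distributions
    for size in range(1, len(rest) + 1):
        for combo in combinations(rest, size):
            W = set(combo)
            NC = {y for y in C if any(x in W for x in nbrs[y])}
            if alternating_connected(W, NC, nbrs):
                support = W | NC
                vertices.append({x: Fraction(1, len(support))
                                 for x in sorted(support)})
    return vertices

square = {"00": ["01", "10"], "01": ["00", "11"],
          "10": ["00", "11"], "11": ["01", "10"]}
for v in mode_polytope_vertices(square, {"01", "10"}):
    print(sorted(v))
```

For the square with $C = \{01, 10\}$ this yields five vertices: the two point distributions on 01 and 10, and the uniform distributions on $\{00, 01, 10\}$, on $\{01, 10, 11\}$, and on all of $V$.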

Corollary 1. $M(G,C)$ is a full-dimensional sub-polytope of $\Delta(V)$.

Proof. The convex hull of
$\{\delta_x : x \in C\} \cup \{e^C_{\{y\}} : y \in V \setminus C\}$ is a
$(|V| - 1)$-simplex and a subset of $M(G,C)$. □



Facets. $M(G,C)$ is defined, as a subset of $\Delta(V)$, by the inequalities

$$ p_x \ge 0 \quad \text{for all } x \in V, \qquad \text{(positivity inequalities)} $$
$$ p_x \ge p_y \quad \text{for all } x \in C \text{ and } y \sim x. \qquad \text{(mode inequalities)} $$

Next we discuss which of these inequalities define facets.

Proposition 2.

1. For any $x \in V \setminus C$, the positivity inequality $p_x \ge 0$ defines
   a facet.

2. If $x \in C$, then $p_x \ge 0$ defines a facet iff $x$ is isolated in $G$.

3. For any $x \in C$ and $y \sim x$, the mode inequality $p_x \ge p_y$ defines
   a facet.

Proof. 1. The inequality $p_x \ge 0$ defines a facet of the subsimplex from the
proof of Corollary 1 and hence also of $M(G,C)$.

2. If $x$ is isolated, then $x$ is a mode of any distribution. Therefore,
$M(G,C) = M(G, C \setminus \{x\})$, and the statement follows from 1.

Otherwise, suppose there exists $y \in V$ with $x \sim y$. Since $C$ is
independent, $y \notin C$. Then $p_x = (p_x - p_y) + p_y$; that is, the
inequality $p_x \ge 0$ is implied by the inequalities $p_x \ge p_y$ and
$p_y \ge 0$, and $p_x \ge 0$ defines a sub-face of the facet $p_y \ge 0$, which
is a strict sub-face, since it does not contain $\delta_x$. Therefore,
$p_x \ge 0$ does not define a facet itself.

3. Let $W := \{z \in C : z \sim y\} \setminus \{x\}$. The uniform distribution
on $W \cup \{y\}$ satisfies all defining inequalities of $M(G,C)$, except
$p_x \ge p_y$. □

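Triangulation and volume. The polytope $M(G,C)$ has a natural triangulation
that comes from a natural triangulation of $\Delta(V)$. Let $N = |V|$ be the
cardinality of $V$. For any bijection $\sigma : \{1, \dots, N\} \to V$ let

$$ \Delta_\sigma = \{p \in \Delta(V) : p_{\sigma(i)} \le p_{\sigma(i+1)} \text{ for } i = 1, \dots, N-1\}. $$

Clearly, the $\Delta_\sigma$ form a triangulation of $\Delta(V)$. In
particular, $\Delta(V) = \bigcup_\sigma \Delta_\sigma$ and
$\operatorname{vol}(\Delta_\sigma \cup \Delta_{\sigma'}) = \operatorname{vol}(\Delta_\sigma) + \operatorname{vol}(\Delta_{\sigma'})$
whenever $\sigma \ne \sigma'$.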

Lemma 1. Let $\Sigma(G,C)$ be the set of all bijections
$\sigma : \{1, \dots, N\} \to V$ that satisfy
$\sigma^{-1}(x) < \sigma^{-1}(y)$ for all $y \in C$ and $x \sim y$. Then
$M(G,C) = \bigcup_{\sigma \in \Sigma(G,C)} \Delta_\sigma$.

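Proof. If $\sigma \in \Sigma(G,C)$ and $p \in \Delta_\sigma$, then
$p \in M(G,C)$ by definition. Conversely, let $p \in M(G,C)$. Choose a
bijection $\sigma : \{1, \dots, N\} \to V$ that satisfies the following:

1. $p_{\sigma(i+1)} \ge p_{\sigma(i)}$ for $i = 1, \dots, N-1$,

2. If $x \in C$ and $y \sim x$, then $\sigma^{-1}(y) < \sigma^{-1}(x)$.

Such a bijection exists: sort $V$ by increasing probability and, among points
of equal probability, place the points of $V \setminus C$ before the points of
$C$. Clearly, $\sigma \in \Sigma(G,C)$ and $p \in \Delta_\sigma$. □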
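The choice of the bijection in the proof of Lemma 1 can be sketched as a sort with a tie-breaking rule. The Python snippet below is a minimal illustration assuming $C$ is independent; the function names `sorting_bijection` and `in_sigma` and the example distribution are hypothetical.

```python
from fractions import Fraction

def sorting_bijection(p, C):
    """Order V by increasing probability; among ties, points outside C first."""
    return sorted(p, key=lambda x: (p[x], x in C))

def in_sigma(sigma, nbrs, C):
    """Check that every neighbor of a declared mode appears before that mode."""
    pos = {x: i for i, x in enumerate(sigma)}
    return all(pos[y] < pos[x] for x in C for y in nbrs[x])

square = {"00": ["01", "10"], "01": ["00", "11"],
          "10": ["00", "11"], "11": ["01", "10"]}
C = {"01", "10"}
# A point of M(G,C) with a tie between the mode 10 and its neighbor 11.
p = {"00": Fraction(1, 10), "01": Fraction(4, 10),
     "10": Fraction(3, 10), "11": Fraction(3, 10)}

sigma = sorting_bijection(p, C)
print(sigma)                        # ['00', '11', '10', '01']
print(in_sigma(sigma, square, C))   # True
```

Without the tie-breaking component in the sort key, the tied pair 10 and 11 could appear in the wrong order, and the resulting bijection would not lie in $\Sigma(G,C)$.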


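Corollary 2. $\operatorname{vol}(M(G,C)) = \frac{|\Sigma(G,C)|}{|V|!} \operatorname{vol}(\Delta(V))$.

Proof. All simplices $\Delta_\sigma$ have the same volume. Moreover,
$\operatorname{vol}(\Delta_\sigma \cap \Delta_{\sigma'}) = 0$ for
$\sigma \ne \sigma'$. Thus,
$\operatorname{vol}(M(G,C)) = |\Sigma(G,C)| \operatorname{vol}(\Delta_\sigma)$
and $\operatorname{vol}(\Delta(V)) = |V|! \operatorname{vol}(\Delta_\sigma)$. □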


It remains to compute the cardinality of $\Sigma(G,C)$. It is not difficult to
enumerate $\Sigma(G,C)$ by iterating over the set $V$. However, $\Sigma(G,C)$
may be very large, and so enumerating it can take a very long time. In fact,
this is a special instance of the problem of counting the number of linear
extensions of a partial order (see below); a problem which in many cases is
known to be #P-complete. In our case, a simple lower bound is
$|\Sigma(G,C)| \ge |C|! \, |V \setminus C|!$ (equality holds only when $G$ is a
complete bipartite graph and $C$ is one of the maximal independent sets).
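For very small graphs, $|\Sigma(G,C)|$ can be counted by brute force over all orderings of $V$. The sketch below does this in Python and compares the count with the lower bound $|C|! \, |V \setminus C|!$. The function name `count_sigma` and the path-graph example are assumptions made for illustration, and the approach is only feasible for roughly ten vertices or fewer.

```python
from itertools import permutations
from math import factorial

def count_sigma(nbrs, C):
    """Count orderings of V in which each x in C comes after all its neighbors."""
    count = 0
    for sigma in permutations(sorted(nbrs)):
        pos = {x: i for i, x in enumerate(sigma)}
        if all(pos[y] < pos[x] for x in C for y in nbrs[x]):
            count += 1
    return count

def lower_bound(nbrs, C):
    return factorial(len(C)) * factorial(len(nbrs) - len(C))

# Square graph: complete bipartite, so the bound is attained.
square = {"00": ["01", "10"], "01": ["00", "11"],
          "10": ["00", "11"], "11": ["01", "10"]}
print(count_sigma(square, {"01", "10"}), lower_bound(square, {"01", "10"}))  # 4 4

# Path a - b - c - d with the single mode b: the bound is strict.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(count_sigma(path, {"b"}), lower_bound(path, {"b"}))  # 8 6
```

Dividing the count by $|V|!$ gives the volume of $M(G,C)$ relative to $\Delta(V)$; for the square this is $4/24 = 1/6$.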

Relation to order polytopes. The results in this section can also be derived 
from results about order polytopes. To explain this, it is convenient to slightly 
generalize our settings. Instead of looking at a graph G and an independent 
subset $C$ of nodes, consider a partial order $\succeq$ on $V$ and let

$$ M(\succeq) := \{p \in \Delta(V) : p_x \ge p_y \text{ whenever } x \succeq y\}. $$

The polytope $M(G,C)$ arises in the special case where $\succeq$ is defined by

$$ x \succeq y \;:\Longleftrightarrow\; x \sim y \text{ and } x \in C. $$

The relation $\succeq$ defined in this way from $G$ and $C$ is a partial order
precisely if $C$ is independent. Our results about vertices, facets and volumes
directly generalize to $M(\succeq)$. We omit further details at this point.

The order polytope of a partial order arises by looking at subsets of the unit
hypercube instead of subsets of the probability simplex (see [3] and references):

$$ O(\succeq) := \{p \in [0,1]^V : p_x \ge p_y \text{ whenever } x \succeq y\}. $$

One can show that $M(\succeq)$ is the vertex figure of $O(\succeq)$ at the
vertex 0. This observation allows one to transfer the results from [3] to
$M(G,C)$.

3 The polytope of strong modes 

Definition 3. A point $x \in V$ is a strong mode of a probability distribution
$p \in \Delta(V)$ if $p_x \ge \sum_{y \sim x} p_y$.

Definition 4. Consider a subset $C \subseteq V$. The polytope of strong
$C$-modes in $G$ is the set $S(G,C)$ of all probability distributions
$p \in \Delta(V)$ for which every $x \in C$ is a strong mode.

Again, in applications one may be interested in strict strong modes that are
characterized by strict inequalities of the form $p_x > \sum_{y \sim x} p_y$.

If $x \sim y$ for two strong modes of $p \in \Delta(V)$, then $p_x = p_y$ and
$p_z = 0$ for all other neighbors $z$ of $x$ or $y$. In order to avoid such
pathological cases, in the following we always assume that $C$ is an
independent subset of $G$.

Example 2. Consider the graph from Example 1. For $C = \{01, 10\}$, the
polytope $S(G,C)$ is given in Figure 1.


Again, we are interested in the vertices of the polytope $S(G,C)$. For any
$x \in V$ let $N_C(x) = \{y \in C : y \sim x\}$ (this is the set of strong
modes which are neighbors of $x$) and let $f^C_x$ be the uniform distribution
on $N_C(x) \cup \{x\}$.

Proposition 3. If $C$ is independent, then $S(G,C)$ is a $(|V| - 1)$-simplex
with vertices $f^C_x$, $x \in V$.

Proof. To see that $\{f^C_x : x \in V\}$ is linearly independent, observe that
the matrix with columns $f^C_x$ is in triangular form when $V$ is ordered such
that the vertices in $C$ come before the vertices in $V \setminus C$.
Therefore, the probability distributions $f^C_x$ span a
$(|V| - 1)$-dimensional simplex.

It is easy to check that $f^C_x \in S(G,C)$ for any $x \in V$. It remains to
prove that any $p \in S(G,C)$ lies in the convex hull of
$\{f^C_x : x \in V\}$. We do induction on the cardinality of
$W := \operatorname{supp}(p) \setminus C$. If $|W| = 0$, then
$p \in \Delta(C)$ is a convex combination of
$\{\delta_x : x \in C\} = \{f^C_x : x \in C\}$. Otherwise, let $x \in W$. Then

$$ p' := \frac{1}{1 - (|N_C(x)| + 1) p_x} \left( p - (|N_C(x)| + 1) \, p_x f^C_x \right) \in \Delta(V), $$

since $p \in S(G,C)$ (if the denominator vanishes, then $p = f^C_x$ and there
is nothing to show). Moreover, $p' \in S(G,C)$. The statement now follows by
induction, since $\operatorname{supp}(p') \setminus C = W \setminus \{x\}$. □
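Since $S(G,C)$ is a simplex, every point has unique convex coordinates with respect to the vertices $f^C_x$. Unrolling the induction above suggests the closed form $(|N_C(x)| + 1) \, p_x$ for $x \in V \setminus C$ and $p_x - \sum_{y \sim x} p_y$ for $x \in C$. The Python sketch below computes these coordinates and rebuilds $p$ from them; the function names and the example distribution are illustrative assumptions, and $C$ is assumed independent.

```python
from fractions import Fraction

def f_vertex(x, nbrs, C):
    """Uniform distribution on N_C(x) together with x."""
    support = {x} | {y for y in nbrs[x] if y in C}
    return {v: Fraction(1, len(support)) for v in support}

def strong_mode_coordinates(p, nbrs, C):
    """Convex coordinates of p with respect to the vertices f_x of S(G,C)."""
    coords = {}
    for x in p:
        if x in C:
            coords[x] = p[x] - sum(p[y] for y in nbrs[x])
        else:
            n_c = sum(1 for y in nbrs[x] if y in C)
            coords[x] = (n_c + 1) * p[x]
    return coords

square = {"00": ["01", "10"], "01": ["00", "11"],
          "10": ["00", "11"], "11": ["01", "10"]}
C = {"01", "10"}
p = {"00": Fraction(1, 10), "01": Fraction(4, 10),
     "10": Fraction(3, 10), "11": Fraction(2, 10)}

coords = strong_mode_coordinates(p, square, C)

# Rebuild p as the weighted sum of the vertices f_x.
rebuilt = {v: Fraction(0) for v in p}
for x, weight in coords.items():
    for v, mass in f_vertex(x, square, C).items():
        rebuilt[v] += weight * mass

print(coords)
print(rebuilt == p)  # True
```

The coordinates are non-negative exactly when the inequalities $p_x \ge \sum_{y \sim x} p_y$ for $x \in C$ and $p_x \ge 0$ for $x \in V \setminus C$ hold, so a zero coordinate indicates that $p$ lies on the corresponding face.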

Proposition 4. The facets of $S(G,C)$ are $p_x \ge \sum_{y \sim x} p_y$ for all
$x \in C$ and $p_x \ge 0$ for all $x \in V \setminus C$.

Proof. It is easy to verify that each of the faces defined by these
inequalities contains $|V| - 1$ vertices. □

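Proposition 5. $\operatorname{vol}(S(G,C)) = \Big( \prod_{x \in V} (|N_C(x)| + 1) \Big)^{-1} \operatorname{vol}(\Delta(V))$.

Proof. After rearrangement of columns, the matrix

$$ (f^C_x)_{x \in V} = \Big( (\delta_x)_{x \in C}, \ \big( \tfrac{1}{|N_C(x)| + 1} 1_{N_C(x) \cup \{x\}} \big)_{x \in V \setminus C, \, x \sim C}, \ (\delta_x)_{x \in V \setminus C, \, x \not\sim C} \Big) $$

is in upper triangular form, with diagonal elements
$\frac{1}{|N_C(x)| + 1}$, $x \in V$. The statement now follows from Lemma 2. □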
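As a sanity check of Proposition 5 on a small instance, the following Python sketch compares the product formula with the absolute determinant of the matrix whose columns are the $f^C_x$. The helper names are assumptions for illustration, and the determinant routine is a plain exact Gaussian elimination rather than a library call.

```python
from fractions import Fraction

def det(rows):
    """Determinant by exact Gaussian elimination over Fractions."""
    a = [list(r) for r in rows]
    n = len(a)
    d = Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if a[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            d = -d  # a row swap flips the sign
        d *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return d

def volume_ratio_by_det(nbrs, C):
    """|det| of the matrix of vertices f_x (C assumed independent)."""
    V = sorted(nbrs)
    vectors = []
    for x in V:
        support = {x} | {y for y in nbrs[x] if y in C}
        vectors.append([Fraction(1, len(support)) if v in support else Fraction(0)
                        for v in V])
    # The determinant is invariant under transposition, so rows may hold the f_x.
    return abs(det(vectors))

def volume_ratio_by_formula(nbrs, C):
    ratio = Fraction(1)
    for x in nbrs:
        ratio /= sum(1 for y in nbrs[x] if y in C) + 1
    return ratio

square = {"00": ["01", "10"], "01": ["00", "11"],
          "10": ["00", "11"], "11": ["01", "10"]}
C = {"01", "10"}
print(volume_ratio_by_det(square, C), volume_ratio_by_formula(square, C))  # 1/9 1/9
```

For the square with $C = \{01, 10\}$, each of the two non-mode vertices has two neighbors in $C$, so both computations give the ratio $1/9$.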

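Lemma 2. Let $\Delta = \operatorname{conv}\{e_0, \dots, e_d\}$ be the standard
$d$-simplex in $\mathbb{R}^{d+1}$ and let $s_0, \dots, s_d \in \Delta$. Then
the $d$-volume of $S = \operatorname{conv}\{s_0, \dots, s_d\}$ satisfies

$$ \operatorname{vol}(S) = |\det(s_0, \dots, s_d)| \operatorname{vol}(\Delta). $$

Proof. The $(d+1)$-volume of the parallelepiped spanned by
$s_0, \dots, s_d \in \mathbb{R}^{d+1}$ is $|\det(s_0, \dots, s_d)|$. The volume
of an $n$-simplex with vertices $v_0, \dots, v_n$ in $\mathbb{R}^n$ is
$\frac{1}{n!} |\det(v_1 - v_0, \dots, v_n - v_0)|$. Hence the volume of the
$(d+1)$-simplex $P$ with vertices $(0, s_0, \dots, s_d)$ is
$\operatorname{vol}(P) = \frac{1}{(d+1)!} |\det(s_0, \dots, s_d)|$. Note that
$P$ is a pyramid over $S$ of height $h = \frac{1}{\sqrt{d+1}}$. Thus
$\operatorname{vol}(P) = \frac{h}{d+1} \operatorname{vol}(S)$. The volume of
the regular $d$-simplex is $\operatorname{vol}(\Delta) = \frac{\sqrt{d+1}}{d!}$.
The statement follows by combining these formulas. □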









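Example 3. Generalizing Examples 1 and 2, let $G$ be the edge graph of an
$n$-cube, such that $V = \{0,1\}^n$ and two points are adjacent if their
Hamming distance is one.

a) If $C \subseteq V$ has cardinality $|C| = k$ and minimum distance 3, then
   $S$ has $2^n$ vertices and volume
   $\operatorname{vol}(S) = 2^{-kn} \operatorname{vol}(\Delta)$, whereas $M$
   has $k(2^n - 1) + 2^n - kn$ vertices and volume
   $\operatorname{vol}(M) = \frac{|\Sigma|}{(2^n)!} \operatorname{vol}(\Delta) \ge k! \, 2^{-kn} \operatorname{vol}(\Delta)$.

b) If $C$ is the set of all even-parity strings, then $S$ has $2^n$ vertices
   and volume
   $\operatorname{vol}(S) = (n+1)^{-2^{n-1}} \operatorname{vol}(\Delta)$,
   whereas $M$ has $2^{2^{n-1}} - 1 + 2^{n-1}$ vertices and volume
   $\operatorname{vol}(M) = \frac{|\Sigma|}{(2^n)!} \operatorname{vol}(\Delta) \ge \binom{2^n}{2^{n-1}}^{-1} \operatorname{vol}(\Delta)$.
   For $n = 2$ and $n = 3$ we have $|\Sigma| = 4$ and $|\Sigma| = 720$. The
   next open case is $n = 4$.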